Normalized sensitivity of multi-dimensional body composition biomarkers for risk change prediction

The limitations of BMI as a measure of adiposity and health risks have prompted the introduction of many alternative biomarkers. However, ranking diverse biomarkers from best to worst remains challenging. This study aimed to address this issue by introducing three new approaches: (1) a calculus-derived, normalized sensitivity score (NORSE) is used to compare the predictive power of diverse adiposity biomarkers; (2) multiple biomarkers are combined into multi-dimensional models, for increased sensitivity and risk discrimination; and (3) new visualizations are introduced that convey complex statistical trends in a compact and intuitive manner. Our approach was evaluated on 23 popular biomarkers and 6 common medical conditions using a large database (National Health and Nutrition Examination Survey, NHANES, N ~ 100,000). Our analysis established novel findings: (1) regional composition biomarkers were more predictive of risk than global ones; (2) fat-derived biomarkers had stronger predictive power than weight-related ones; (3) waist and hip circumferences are always elements of the strongest risk predictors; (4) our new, multi-dimensional biomarker models yield higher sensitivity, personalization, and separation of the negative effects of fat from the positive effects of lean mass. Our approach provides a new way to evaluate adiposity biomarkers, brings forth important new clinical insights and sets a path for future biomarker research.

www.nature.com/scientificreports/
answered by using differential calculus 26 . We introduce the Normalized Sensitivity score (NORSE), a new measure of risk change that is based on the established mathematical tool of sensitivity analysis 27 . Note that here the term "sensitivity" is intended as the rate of change of a dependent variable with respect to an independent one 28,29 and is different from sensitivity as the True Positive Rate of a classifier 21 . Because of its focus on risk changes, the NORSE tool may be used to help people adopt healthier lifestyles and behaviors. Unlike existing detection-based approaches, NORSE is designed to measure the rate of change of a health risk (e.g., prevalence of hypertension) with respect to changes in an input biomarker (e.g., body fat mass). We do not propose new biomarkers, but rather a new way of evaluating and ranking existing biomarkers based on the sensitivity of condition prevalence to each of them.
Traditionally, researchers have tried to create strong biomarkers by combining simpler ones through various hand-designed formulae 22,30,31 . This is the case for ABSI 18 and RFM 19 . In contrast, here we propose to combine multiple raw biomarkers through joint, multi-dimensional statistical models. We find that 2D models (association of two biomarkers with one condition) yield higher discrimination and personalization than 1D models (association of one biomarker with one condition), and that they enable the separation of the negative effects of fat mass from the positive effects of muscle mass on people's health risks.
Our results are validated on large datasets of participants (subsets of NHANES N ~ 100 K) and explained via new, compact, and intuitive visualizations.

Methods
Participants. All analyses in this study were conducted using the NHANES dataset 32 , collected by the Centers for Disease Control and Prevention (CDC) between the years 1999 and 2020. The dataset comprises more than 100,000 unique participants with data related to demographics, body composition, fitness habits, eating habits and medical conditions. Our analysis focuses on the adult population only (ages between 20 and 110). The Supplementary Material available online presents a detailed account of the NHANES study design, participant selection, sample size and participants' demographic characteristics.
Health conditions. This study considers 6 common health conditions: hypertension, diabetes, high cholesterol, arthritis, coronary heart disease and cancer (general malignancy). Whether a participant is positive for a condition is assessed via the participant's own answers to questions like: "Has a doctor ever told you that you have diabetes?", as defined in the NHANES protocol (see Supplementary Material). The lack of an official diagnosis likely adds noise to our results, but aggregating statistics over a relatively large number of participants mitigates that issue.

Body composition biomarkers.
This study analyzes the predictive power of 23 biomarkers, including BMI, WHR, ABSI, PBF and RFM. The full list of biomarkers and their descriptions is given in the Supplementary Material. We have organized all biomarkers into three groups: global body composition biomarkers (e.g., BMI, PBF, total body weight), composition biomarkers based on regional measurements (e.g., percent trunk fat, waist circumference, WHR), and biomarkers that are less strongly associated with body composition (e.g., standing height and leg length).
Statistical models for risk change prediction. Here we present two different types of risk change prediction models. 1D models are those where we study the association between one medical condition and one input biomarker. In 2D models, we have one medical condition and two input biomarkers. Multi-dimensional models combine multiple biomarkers using a joint statistical model, rather than trying to compress their information into a single, scalar output. Examples of 1D and 2D biomarker models are illustrated in Fig. 1. Notice that, in theory, it is possible to extend our models to a dimensionality higher than 2. However, the limited amount of data in NHANES and the so-called "curse of dimensionality" would yield noisier results 33 .
Distribution and prevalence maps. Our biomarker models are visualized via two types of maps: "population distribution maps" (aka d-maps) and "condition prevalence maps" (aka p-maps).
d-map. A distribution map reports the probability distribution of a population as a function of input biomarkers (Fig. 1A, C). Each cell in a d-map reports the total number of people with the biomarker $B$ within a given range (e.g., $B \in [18.5, 25]$), both in absolute terms (e.g., $n_{cell} = 5166$ participants) and as a percentage of the total population (e.g., $n_{cell}/n_{tot} = 21.0\%$). A d-map is visualized via a white-blue-purple colormap where white denotes 0% and purple denotes the maximum probability for that map.
p-map. A prevalence map reports the prevalence of a given medical condition $C$ as a function of a biomarker $B$ (Fig. 1B, D). We have $n_{cell}$ participants in a cell, out of which $n_{cond}$ are positive for the condition $C$. The cell reports the condition prevalence $P_C = n_{cond}/n_{cell}$ as a percentage. A p-map is visualized via a grey-green-yellow-red colormap, with green indicating low prevalence and red indicating high prevalence. In d-maps and p-maps, cells with $n_{cell} < 40$ or $n_{cell}/n_{tot} < 0.2\%$ are left empty to reduce noise associated with small counts.
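The cell quantities described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `one_d_maps`, the half-open bin convention and the synthetic data are all assumptions, while the two emptiness thresholds (40 participants, 0.2% of the population) come from the text.

```python
from collections import Counter

def one_d_maps(biomarker, has_condition, edges, min_count=40, min_frac=0.002):
    """Per-cell (n_cell, share of population in %, prevalence in %) for a 1D model.

    Cells with n_cell < min_count or n_cell / n_tot < min_frac are returned
    as None, mirroring the "left empty" rule in the text.
    Bins are half-open [edges[i], edges[i+1]) (an assumption).
    """
    n_tot = len(biomarker)
    counts, positives = Counter(), Counter()
    for b, c in zip(biomarker, has_condition):
        for i in range(len(edges) - 1):
            if edges[i] <= b < edges[i + 1]:
                counts[i] += 1
                positives[i] += int(c)
                break
    cells = []
    for i in range(len(edges) - 1):
        n_cell = counts[i]
        if n_cell < min_count or n_cell / n_tot < min_frac:
            cells.append(None)  # too few participants: cell left empty
            continue
        share_pct = 100.0 * n_cell / n_tot            # d-map entry
        prevalence_pct = 100.0 * positives[i] / n_cell  # p-map entry
        cells.append((n_cell, share_pct, prevalence_pct))
    return cells

# Synthetic data, invented purely for illustration (not NHANES values).
bmi = [22.0] * 50 + [27.0] * 60 + [40.0] * 5
diabetic = [True] * 10 + [False] * 40 + [True] * 30 + [False] * 30 + [True] * 5
demo_cells = one_d_maps(bmi, diabetic, edges=[18.5, 25.0, 30.0, 45.0])
```

In this toy population the third cell holds only 5 people, so it is suppressed even though all of them are positive for the condition, which is the small-count noise the thresholds are meant to hide.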
Sensitivity of prevalence with respect to input biomarkers. In this study we explore associations between changes in biomarkers (the independent variable $B$) and the corresponding changes in condition prevalence (the dependent variable $P$). For this, we use derivative-based sensitivity analysis 34 . Illustrative examples are presented in Fig. S2. The sensitivity of the prevalence $P_C$ of a condition $C$ with respect to a biomarker $B$ is defined as the partial derivative $S_{CB} = \partial P_C / \partial B$ (in the continuous domain). The higher the $S_{CB}$ value, the larger the influence of the input biomarker on the condition prevalence. More generally, $B \in \mathbb{R}^n$ is an n-dimensional biomarker vector, and $S_{CB}$ is the associated gradient vector $S_{CB} = \left( \partial P_C / \partial B_1, \partial P_C / \partial B_2, \ldots, \partial P_C / \partial B_n \right)$. Derivatives and gradients capture only the local sensitivity of functions with respect to independent variables. However, our experiments show a roughly linear relationship between disease prevalence and various biomarkers, thus justifying our approach (see examples in …).
Normalized sensitivity to predict risk changes. In general, different biomarkers have different measurement units and vary in their value ranges. For example, for the standing height biomarker, we typically have $B \in [140, 220]$ cm for adults, while for BMI we have $B \in [10, 60]$ kg/m$^2$. To compare sensitivities of diverse biomarkers with one another we first need to map their values to a canonical range. We do so via a normalized sensitivity score (namely NORSE), which we define as follows. A biomarker $B$ measured in our population has mean $\mu_B$ and standard deviation $\sigma_B$. Thus, its z-score 35 is $X = (B - \mu_B) / \sigma_B$. The z-score of a measurement represents its distance (in terms of number of standard deviations) from the mean. Since $B = \mu_B + \sigma_B X$, we have $\partial B / \partial X = \sigma_B$. By the chain rule, the sensitivity with respect to the z-score $X$ (i.e., the NORSE measurement $N_{CB}$) is defined as $N_{CB} = \frac{\partial P_C}{\partial B} \frac{\partial B}{\partial X} = \sigma_B S_{CB}$.
The NORSE score is a unit-less number and can now be used to compare the risk predictive power of diverse biomarkers with respect to one another.
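As a rough sketch of how a NORSE score could be computed in practice, the Python below multiplies the population standard deviation of a biomarker by an estimate of the slope of prevalence against that biomarker. Estimating the slope with an ordinary least-squares line through the p-map cells is an assumption motivated by the roughly linear relationship mentioned in the text; the paper does not spell out its estimator. The function names and the data are hypothetical.

```python
import statistics

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def norse(biomarker_values, cell_centers, cell_prevalence_pct):
    """NORSE = sigma_B * S_CB, with S_CB approximated by a fitted slope.

    biomarker_values: raw measurements, used only for sigma_B.
    cell_centers, cell_prevalence_pct: one point per non-empty p-map cell.
    """
    sigma_b = statistics.pstdev(biomarker_values)  # population std (assumption)
    s_cb = least_squares_slope(cell_centers, cell_prevalence_pct)
    return sigma_b * s_cb

# Synthetic illustration (not NHANES data): the same biomarker in two units.
heights_m = [1.60, 1.70, 1.80, 1.90]
prevalence_pct = [30.0, 26.0, 22.0, 18.0]
heights_cm = [100.0 * h for h in heights_m]

score_m = norse(heights_m, heights_m, prevalence_pct)
score_cm = norse(heights_cm, heights_cm, prevalence_pct)
```

The two scores agree up to floating-point error: changing the unit scales the standard deviation up and the slope down by the same factor, which is why the normalized score can be compared across biomarkers.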
Normalized sensitivity in maps. For consistency and to aid comparisons, the length of each cell side in our visualization maps is set to $\frac{1}{2}\sigma_B$ (see Fig. 1). The blue and brown numbers on the side of a p-map are the NORSE scores computed for each row and column, respectively. Small NORSE values ($|N_{CB}| < 2$) are hidden to remove noise in the visualizations. Notice how in the example in Fig. 1D the $N_{CB_1}$ sensitivities (blue) are negative, while the $N_{CB_2}$ ones (brown) are positive. This important effect will be discussed in detail in the results section.
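The two display rules in this paragraph (cells half a standard deviation wide, and small scores hidden) can be illustrated as follows. Centering the grid on the population mean and the number of cells per side are assumptions made for the sketch; the text only fixes the cell width and the threshold of 2.

```python
import statistics

def half_sigma_edges(values, n_cells_each_side=4):
    """Cell edges of width sigma_B / 2, centered on the mean (an assumption)."""
    mu = statistics.fmean(values)
    step = statistics.pstdev(values) / 2.0
    return [mu + k * step for k in range(-n_cells_each_side, n_cells_each_side + 1)]

def displayed_norse(scores, threshold=2.0):
    """Replace scores with |N| < threshold by None, i.e. hide them in the map."""
    return [s if abs(s) >= threshold else None for s in scores]

# Synthetic values, for illustration only.
demo_edges = half_sigma_edges([10.0, 20.0], n_cells_each_side=2)
demo_shown = displayed_norse([-5.2, 1.9, 2.0, 0.0])
```

With these inputs the mean is 15 and the standard deviation is 5, so the edges are spaced 2.5 apart; only the scores -5.2 and 2.0 remain visible.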
All methods were performed in accordance with the relevant guidelines and regulations.
Meeting presentation. This work has not been published or presented elsewhere.

Results
Our modeling approach yielded five main new findings: (1) waist and hip circumferences used either in a ratio or within a 2D joint model yield the strongest predictive power; (2) fat-derived biomarkers have a stronger predictive power than weight-related ones such as BMI and total body weight; (3) regional body fat biomarkers are more predictive of health risks than global fat measurements; (4) 2D biomarker models produce smaller and more homogeneous cohorts than 1D ones which, in turn, leads to higher sensitivity, discrimination and personalization of health risks; and (5) 2D biomarker models help us explain the "obesity paradox" as the effect of controlling separately for fat mass and lean mass. These observations are elaborated upon in the sections that follow.
Predicting health risk changes from individual biomarkers. The NORSE scores for 23 biomarkers and 6 medical conditions for adult men and women are shown in Table 1. The last column reports NORSE scores averaged across conditions and genders. These values are used to rank all biomarkers. According to these results, WHR is the strongest health predictor, in the sense that normalized changes to WHR are associated with the largest changes in condition prevalence. The waist-to-thigh ratio is second, ABSI is third and RFM is fourth. BMI is in the middle of the table and total body weight lower still. Standing height and leg length have slightly negative NORSE scores, suggesting that greater height and longer legs are statistically associated with lower health risks. Interestingly, the top performing biomarkers are all regional ones; specifically, measurements associated with abdominal fat (e.g., WHR, ABSI, RFM, PTF). In the middle of the table we have global composition biomarkers (e.g., PBF, FMI, BMI); and at the bottom, biomarkers that do not correlate much with body composition (e.g., standing height, leg length). NORSE scores were able to cluster all biomarkers into these three groups automatically. Note that PBF is the strongest of the global adiposity biomarkers.
The bottom row in the table reports column-wise average NORSE scores. Their value indicates which health conditions are "easier" to predict from individual biomarkers. In our results, hypertension shows the largest average NORSE, and cancer the lowest.
Age stratification analysis. As an example, the tables in Fig. 2 show diabetes prevalence with respect to WHR, for men and women and for different age brackets. As age increases, diabetes prevalence increases, on average. The NORSE values follow a curve; they are low for young and elderly people, and they are higher in the middle. Very young people tend to have low diabetes risk even for high WHR values, and older people tend to have high prevalence, independent of WHR. People in the middle age brackets are those for whom changing WHR may have the greatest influence on diabetes risk.
Predicting health risk changes from joint, 2D biomarker models. A 2D model associates two distinct biomarkers with the prevalence of a given condition. The example in Fig. 3 shows d-maps and p-maps for X = weight, Y = waist, C = diabetes for adult men and women. Notice that when fixing the weight coordinate (e.g., 66 < weight < 76 kg for men), diabetes prevalence increases considerably (from 0.5% to 25.3%) with increasing waist. Also, for a fixed waist (e.g., 100 cm < waist < 108 cm for men), prevalence decreases (from 25.3% to 3.8%) for increasing weight. This shows two things: (1) 2D biomarker models can discriminate different levels of risk better than using only one biomarker at a time, and (2) there are cases where increases in body weight correspond to improvements in health risks. Notice how all x-sensitivities (in blue) are negative, and all y-sensitivities (in gold) are positive.
Separating the effects of abdominal fat and lean mass. The effect of reduced health risks with (apparent) increased obesity goes under the name of the "obesity paradox" 36,37 . Here we explain the negative weight-risk correlation by separating the negative effects of fat from the positive effects of lean mass. In our 2D model, such separation happens naturally by controlling for waist circumference. All participants within the same row have a similar waist circumference. We hypothesize that for those people, residual weight increases are mostly due to increases in lean muscle tissue, which tends to be associated with better health 38-40 . With this interpretation the observed prevalence trends are explained and there is no paradox.
Risk discrimination in 2D models. Two biomarkers can be combined by, e.g., taking a ratio (as for WHR) or through a joint 2D statistical model. In the former approach, some information is lost. For instance, imagine two people: one has waist = 81 cm, hip = 90 cm and the other has waist = 108 cm, hip = 120 cm. They have the same WHR = 0.9 but very different risk levels (see Fig. S5 in Supplementary Material). Generally, multi-dimensional models yield higher risk discrimination than 1D ones, as shown next.
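The information loss in the ratio can be made concrete with the two people from the example. In the sketch below, the ratio maps both to the same value, whereas a 2D grid over (waist, hip) places them in different cells. The 6 cm cell side is a hypothetical value chosen only for illustration; in the paper the cell side is half a standard deviation of each biomarker.

```python
import math

CELL_SIDE_CM = 6.0  # hypothetical cell side, for illustration only

def whr(waist_cm, hip_cm):
    """Waist-to-hip ratio."""
    return waist_cm / hip_cm

def cell_2d(waist_cm, hip_cm, side=CELL_SIDE_CM):
    """Index of the (waist, hip) cell on a regular 2D grid."""
    return (math.floor(waist_cm / side), math.floor(hip_cm / side))

person_a = (81.0, 90.0)    # waist, hip in cm (from the text)
person_b = (108.0, 120.0)  # waist, hip in cm (from the text)

same_ratio = math.isclose(whr(*person_a), whr(*person_b))
cell_a = cell_2d(*person_a)
cell_b = cell_2d(*person_b)
```

Here `same_ratio` is true while `cell_a` and `cell_b` differ, so a joint model can assign the two people different prevalence values where the ratio cannot.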
Ranking 2D biomarker models based on NORSE scores and NORSE separation. Our 23 biomarkers combine into 253 valid pairs. Each pair defines a 2D model, for which we measure its NORSE scores, across two genders and 6 health conditions. NORSE scores are calculated for both biomarkers (both along the x and along the y dimensions). For many models, one of those scores tends to be strongly negative (increasing biomarker correlates with reduced risks) and the other strongly positive (increasing biomarker correlates with increased risks). We hypothesize that their difference (namely NORSE separation) relates to the model's ability to discriminate the negative effect of fat from the positive effect of lean mass. Table 2 presents results for the 10 models with the largest NORSE separation. The right-most column reports average NORSE separations across conditions and genders. Those values are used to rank all biomarker pairs. Notice that for many 2D models, their average NORSE scores are higher than those of the 1D models (max avg NORSE is < 10 in Table 1, and > 19 in Table 2). In fact, keeping the input biomarkers separate (as opposed to fusing them together into a single output) allows us to subdivide the participant population into smaller and more homogeneous cohorts, for higher risk discrimination. For both men and women, the largest NORSE separation is achieved by the hip-waist joint model. This confirms the power of using waist and hip circumferences for risk prediction (see Table 1).
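The pair count and the ranking step can be sketched as follows. The count of 253 is simply the number of unordered pairs of 23 items. Taking the separation as the absolute difference of the two NORSE scores is an assumption, since the text only says "their difference"; the pair names and scores below are invented placeholders, not values from Table 2.

```python
from itertools import combinations

def norse_separation(norse_x, norse_y):
    """Gap between the two per-axis NORSE scores (absolute value: an assumption)."""
    return abs(norse_x - norse_y)

# All unordered pairs of 23 biomarkers, indexed 0..22.
n_biomarkers = 23
pairs = list(combinations(range(n_biomarkers), 2))

# Made-up (x-score, y-score) values for three hypothetical 2D models.
hypothetical_scores = {
    "pair_a": (-9.0, 12.0),
    "pair_b": (-7.5, 11.0),
    "pair_c": (-1.0, -0.5),
}

# Rank models from largest to smallest separation.
ranking = sorted(
    hypothetical_scores,
    key=lambda name: norse_separation(*hypothetical_scores[name]),
    reverse=True,
)
```

A model whose two scores have opposite signs and large magnitudes, like the hypothetical `pair_a`, rises to the top of the ranking, matching the intuition that it separates a protective axis from a harmful one.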
The weight-waist 2D model. The NHANES dataset does not contain many measurements of hip circumference (n = 2402 valid measurements for men and n = 2523 for women when intersected with C = hypertension). The weight-waist pair is amongst the best in terms of NORSE separation, and has one order of magnitude more measurements (n = 23,726 for men, n = 25,437 for women for C = hypertension). More data ensures lower measurement noise and more confident results. For that reason, our next example focuses on the weight-waist model. Figure 4 shows p-maps for C = cancer (A, B) and C = hypertension (C, D) for adult men. In panel A, for a fixed weight the cancer prevalence increases with increasing waist circumference. When fixing the waist, the prevalence decreases with increasing weight. Panel B shows the same trends even after removing smokers from our analysis.

Discussion
This study introduces a new way of assessing the strength of biomarkers as predictors of health risk changes. In contrast to AUC-ROC type techniques, here we estimate how much changes in input biomarkers affect changes in health risks. We achieve that through a new normalized sensitivity score. The results in this paper show that when used in isolation, WHR is the biomarker with the strongest "effect" (in a sensitivity sense) on the risks of common health conditions. However, a high sensitivity also means that a small error in the input measurement is likely to have a large, detrimental effect on the accuracy of the output health risk.
For example, imagine that someone has waist = 85 cm and hip = 100 cm (thus WHR = 0.85), but those quantities are measured as waist = 87.5 cm, hip = 97.5 cm. The WHR is therefore erroneously measured as WHR ≈ 0.9. A 2.5 cm error on the input biomarkers translates into a 0.05 error on the output WHR, which for adult men (Fig. 2A) translates into a large, 9% error on hypertension risk. These observations, exposed by the analyses reported herein, lead us to argue that to benefit from the increased sensitivity of our models, it is necessary to use state-of-the-art digital anthropometrics technology to increase input accuracy and thus the accuracy of risk predictions. Much literature discusses the errors of measurements obtained using, for example, a measuring tape 41-43 . Recent progress in computer vision and photogrammetry offers accurate and inexpensive tools for measuring body composition and anthropometrics through optical scanners or even conventional smartphones 44-50 .
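The arithmetic of this example can be checked with a short sketch. The waist and hip values are taken from the text. The first-order propagation formula (risk error ≈ slope × biomarker error) follows from the derivative-based definition of sensitivity, and the slope used here is merely backed out from the paper's own rounded numbers (9 percentage points per 0.05 of WHR); it is not an independently measured quantity.

```python
def whr(waist_cm, hip_cm):
    """Waist-to-hip ratio."""
    return waist_cm / hip_cm

def first_order_risk_error(slope_pct_per_unit, biomarker_error):
    """Linearized error on prevalence: delta_P ~ S_CB * delta_B."""
    return slope_pct_per_unit * biomarker_error

true_whr = whr(85.0, 100.0)      # 0.85
measured_whr = whr(87.5, 97.5)   # about 0.897, reported as 0.9 in the text
whr_error = measured_whr - true_whr

# Slope implied by the text's rounded figures (an illustrative assumption).
implied_slope = 9.0 / 0.05  # percentage points of prevalence per unit of WHR
risk_error_pct = first_order_risk_error(implied_slope, 0.05)
```

Before rounding, the WHR error is closer to 0.047 than 0.05, which shows how two opposite 2.5 cm tape errors compound in a ratio.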

Limitations.
Limitations of the analysis presented here include: examination of cross-sectional data only, with no longitudinal studies; establishing statistical associations rather than mechanistic understanding of cause and effect; lack of an official diagnosis for health conditions, with reliance only on participants' self-reported answers to a questionnaire; limited population size; treating diabetes as a single condition without distinction between type I and type II (by far the most common); and use of disease prevalence as a proxy for health risks.

Conclusions
This study advances a new way of estimating the power of different body composition biomarkers when predicting health risk changes. Our results indicate that waist and hip circumferences, either used in a ratio or in a joint 2D model, hold the strongest predictive power. In general, regional body composition biomarkers produce the best results. We also show how joint biomarker models provide further resolution, prediction accuracy and the possibility to separate the negative effects of body fat from the positive effects of muscle mass. Our joint models help explain the "obesity paradox" via conventional statistical analysis.
We believe that our findings will lead to a better understanding of obesity, its causes and its effects on people's health. Also, focusing on sensitivity measures may help individuals understand which behavior changes affect their health the most, and embrace healthier habits. Finally, combining our findings with emerging technology for body scanning and anthropometric measurements promises to advance the way we assess obesity and associated health risks for everyone.

Data availability
The data used in this study can be downloaded from the Centers for Disease Control and Prevention website at https://www.cdc.gov/nchs/nhanes/index.htm.